Improving Non-Native ASR Through Stochastic Multilingual Phoneme Space Transformations
نویسندگان
چکیده
We propose a stochastic phoneme space transformation technique that allows the conversion of conditional source phoneme posterior probabilities (conditioned on the acoustics) into target phoneme posterior probabilities. The source and target phonemes can be in any language and phoneme format such as the International Phonetic Alphabet. The novel technique makes use of a Kullback-Leibler divergence based hidden Markov model and can be applied to non-native and accented speech recognition or used to adapt systems to underresourced languages. In this paper, and in the context of hybrid HMM/MLP recognizers, we successfully apply the proposed approach to non-native English speech recognition on the HIWIRE dataset.
منابع مشابه
Multilingual speech recognition A posterior based approach
Modern automatic speech recognition (ASR) systems are based on parametric statistical models such as hidden Markov models (HMMs), exploiting 1) acoustic-phonetic models, which need to be trained on large amount of acoustic data, 2) a language model, which needs to be trained on large amount of text data and, finally, 3) a lexicon with phonetic transcription which requires linguistic expertise. ...
متن کاملImproving ASR performance on non-native speech using multilingual and crosslingual information
This paper presents our latest investigation of automatic speech recognition (ASR) on non-native speech. We first report on a non-native speech corpus an extension of the GlobalPhone database which contains English with Bulgarian, Chinese, German and Indian accent and German with Chinese accent. In this case, English is the spoken language (L2) and Bulgarian, Chinese, German and Indian are the ...
متن کاملIntegrating Thai grapheme based acoustic models into the ML-MIX framework - for language independent and cross-language ASR
Grapheme based speech recognition is a powerful tool for rapidly creating automatic speech recognition (ASR) systems in new languages. For purposes of language independent or cross language speech recognition it is necessary to identify similar models in the different languages involved. For phoneme based multilingual ASR systems this is usually achieved with the help of a language independent ...
متن کاملImproving proper name recognition by means of automatically learned pronunciation variants
This paper introduces a novel lexical modeling approach that aims to improve large vocabulary proper name recognition for native and non-native speakers. The method uses one or more so-called phoneme-to-phoneme (P2P) converters to add useful pronunciation variants to a baseline lexicon. Each P2P converter is a stochastic automaton that applies context-dependent transformation rules to a baselin...
متن کاملImproving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery
Development of state-of-the-art automatic speech recognition (ASR) systems requires acoustic resources (i.e., transcribed speech) as well as lexical resources (i.e., phonetic lexicons). It has been shown that acoustic and lexical resource constraints can be overcome by first training an acoustic model that captures acoustic-to-multilingual phone relationships on languageindependent data; and th...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011